Sparse modeling of neural network posterior probabilities for exemplar-based speech recognition
Authors
Abstract
We study automatic speech recognition by direct use of acoustic features (exemplars) without any assumption on the underlying stochastic process. Prior studies exploit spectral exemplars. In this work, we present the use of neural network sub-word posterior probabilities as exemplars. The space of sub-word observations is low-dimensional (e.g. R^{K×T}), whereas the word transcription requires reconstructing a high-dimensional representation (e.g. R^{L×T}, with L ≫ K). Given the prior knowledge that for any given utterance the word representation is highly sparse, we cast the speech recognition problem as sparse reconstruction of word posteriors given the compressed (low-dimensional) acoustic observation. The sub-word units (phones) are denoted by {q_k}_{k=1}^{K}. Given an input spectral feature x_t at time t, a (deep) neural network [1] is used to estimate the posterior probabilities {p(q_k|x_t)}_{k=1}^{K}. The phone posterior probabilities are related to the word posterior probabilities p(w_l|x_t) through …
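The core idea above — recovering a high-dimensional sparse word representation from a low-dimensional phone-posterior observation — is an instance of sparse recovery. A minimal sketch of this setup, using orthogonal matching pursuit as a stand-in solver (the paper's actual inference and dictionary dimensions are not specified here; the dictionary D, the lexicon size L = 100, and the phone count K = 20 below are illustrative assumptions):

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal Matching Pursuit: recover a sparse z such that y ~= D @ z."""
    residual = y.copy()
    support = []
    z = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # pick the dictionary atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of y on the atoms selected so far
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    z[support] = coef
    return z

rng = np.random.default_rng(0)
K, L = 20, 100                     # hypothetical: 20 phones, 100-word lexicon
D = rng.standard_normal((K, L))    # toy dictionary mapping words -> phone space
D /= np.linalg.norm(D, axis=0)     # unit-norm atoms

z_true = np.zeros(L)
z_true[[3, 42]] = [0.7, 0.3]       # sparse "word posterior": 2 active words
y = D @ z_true                     # compressed phone-posterior observation

z_hat = omp(D, y, n_nonzero=2)     # sparse reconstruction from K < L measurements
```

With K = 20 measurements and only 2 active entries, a random unit-norm dictionary is incoherent enough that the sparse code is recovered exactly; this mirrors why the highly sparse word representation makes reconstruction from low-dimensional observations feasible.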
Similar resources
Sparse Hidden Markov Models for Exemplar-based Speech Recognition Using Deep Neural Network Posterior Features
Statistical speech recognition has been cast as a natural realization of the compressive sensing and sparse recovery. The compressed acoustic observations are sub-word posterior probabilities obtained from a deep neural network (DNN). Dictionary learning and sparse recovery are exploited for inference of the high-dimensional sparse word posterior probabilities. This formulation amounts to reali...
Sparse Hidden Markov Models for Exemplar-based Speech Recognition Using Posterior Features
Stochastic speech recognition has been cast as a natural realization of the compressive sensing problem in this work. The compressed acoustic observations are sub-word posterior probabilities obtained from a deep neural network. Dictionary learning and sparse recovery are exploited for inference of the high-dimensional sparse word posterior probabilities. This formulation amounts to realization...
Sparse Hidden Markov Models for Automatic Speech Recognition
Stochastic speech recognition has been cast as a natural realization of the compressive sensing problem in this work. The compressed acoustic observations are subword posterior probabilities obtained from a deep neural network. Dictionary learning and sparse recovery are exploited for inference of the high-dimensional sparse word posterior probabilities. This formulation amounts to realization ...
Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks
Posteriors generated from exemplar-based sparse representation methods are often learned to minimize reconstruction error of the feature vectors. These posteriors are not learned through a discriminative process linked to the word error rate (WER) objective of a speech recognition task. In this paper, we explore modeling exemplar-based posteriors to address this issue. We first explore posterio...
Exemplar-based Sparse Representation for Posterior Features
Posterior features have been shown to yield very good performance in multiple contexts including speech recognition, spoken term detection, and template matching. These days, posterior features are usually estimated at the output of a neural network. More recently, sparse representation has also been shown to potentially provide additional advantages to improve discrimination and robustness. On...
Journal: Speech Communication
Volume: 76, Issue: -
Pages: -
Publication year: 2016